In this notebook, some template code has already been provided for you, and you will need to implement additional functionality to successfully complete this project. You will not need to modify the included code beyond what is requested. Sections that begin with '(IMPLEMENTATION)' in the header indicate that the following block of code will require additional functionality which you must provide. Instructions will be provided for each section, and the specifics of the implementation are marked in the code block with a 'TODO' statement. Please be sure to read the instructions carefully!
Note: Once you have completed all the code implementations, you need to finalize your work by exporting the Jupyter Notebook as an HTML document. Before exporting the notebook to HTML, all the code cells need to have been run so that reviewers can see the final implementation and output. You can then export the notebook by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.
In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question X' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.
Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. Markdown cells can be edited by double-clicking the cell to enter edit mode.
The rubric contains optional "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. If you decide to pursue the "Stand Out Suggestions", you should include the code in this Jupyter notebook.
Photo sharing and photo storage services like to have location data for each photo that is uploaded. With the location data, these services can build advanced features, such as automatic suggestion of relevant tags or automatic photo organization, which help provide a compelling user experience. Although a photo's location can often be obtained by looking at the photo's metadata, many photos uploaded to these services will not have location metadata available. This can happen when, for example, the camera capturing the picture does not have GPS or if a photo's metadata is scrubbed due to privacy concerns.
If no location metadata for an image is available, one way to infer the location is to detect and classify a discernible landmark in the image. Given the large number of landmarks across the world and the immense volume of images that are uploaded to photo sharing services, using human judgement to classify these landmarks would not be feasible.
In this notebook, you will take the first steps towards addressing this problem by building models to automatically predict the location of the image based on any landmarks depicted in the image. At the end of this project, your code will accept any user-supplied image as input and suggest the top k most relevant landmarks from 50 possible landmarks from across the world. The image below displays a potential sample output of your finished project.

We break the notebook into separate steps. Feel free to use the links below to navigate the notebook.
Note: if you are using the Udacity workspace, YOU CAN SKIP THIS STEP. The dataset can be found in the /data folder and all required Python modules have been installed in the workspace.
Download the landmark dataset.
Unzip the folder and place it in this project's home directory, at the location /landmark_images.
Install the following Python modules:
In this step, you will create a CNN that classifies landmarks. You must create your CNN from scratch (so, you can't use transfer learning yet!), and you must attain a test accuracy of at least 20%.
Although 20% may seem low at first glance, it is more reasonable once you realize how difficult a problem this is. Often, an image taken at a landmark captures a fairly mundane subject, such as an animal or a plant, like in the following picture.

Just by looking at that image alone, would you have been able to guess that it was taken at the Haleakalā National Park in Hawaii?
An accuracy of 20% is significantly better than random guessing, which would provide an accuracy of just 2%. In Step 2 of this notebook, you will have the opportunity to greatly improve accuracy by using transfer learning to create a CNN.
Remember that practice is far ahead of theory in deep learning. Experiment with many different architectures, and trust your intuition. And, of course, have fun!
Use the code cell below to create three separate data loaders: one for training data, one for validation data, and one for test data. Randomly split the images located at landmark_images/train to create the train and validation data loaders, and use the images located at landmark_images/test to create the test data loader.
All three of your data loaders should be accessible via a dictionary named loaders_scratch. Your train data loader should be at loaders_scratch['train'], your validation data loader should be at loaders_scratch['valid'], and your test data loader should be at loaders_scratch['test'].
You may find this documentation on custom datasets to be a useful resource. If you are interested in augmenting your training and/or validation data, check out the wide variety of transforms!
#
# All module imports for the project
#
import os, sys
from collections import OrderedDict
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import torch
from torch import nn, optim
import torchvision as tv
import cv2
import PIL
from PIL import Image
print("==== Software versions ====")
print("Python :", sys.version.split()[0])
print("NumPy :", np.__version__)
print("MatPlotLib :", matplotlib.__version__)
print("PyTorch :", torch.__version__)
print("TorchVision :", tv.__version__)
print("cv2 :", cv2.__version__)
print("PIL :", PIL.__version__)
==== Software versions ====
Python : 3.8.8
NumPy : 1.20.2
MatPlotLib : 3.3.4
PyTorch : 1.8.1+rocm4.0.1
TorchVision : 0.9.1+cu102
cv2 : 4.5.2
PIL : 8.2.0
The above shows the version numbers of the software installed on my computer. The notebook was also tested in the Udacity workspace, with the following software versions:
Python : 3.6.3
NumPy : 1.12.1
MatPlotLib : 2.1.0
PyTorch : 0.4.0
TorchVision : 0.2.1
cv2 : 3.3.1
PIL : 5.2.0
### Done: Write data loaders for training, validation, and test sets
## Specify appropriate transforms, and batch_sizes
#
# Parameters
#
# Path to image dataset; must end with '/'
IMAGES_DIR = '../data/landmark_images/'
# Fraction of training images to use for validation
VALIDATION_FRAC = 0.2
# Width and height of input image in pixels after transforms
TRANSFORM_WIDTH_HEIGHT = 128
# Parameters for the color normalization step.
# The choice for these parameters was informed by looking at the average
# mean and standard deviation from a random sample of images in the dataset.
#
TRANSFORM_MEAN = [0.5, 0.5, 0.5]
TRANSFORM_STD = [0.25, 0.25, 0.25]
# Width of ignored border region in pixels, after scaling.
# Image is first resized to `TRANSFORM_WIDTH_HEIGHT + DISCARD_BORDER * 2`,
# then cropped to `TRANSFORM_WIDTH_HEIGHT`.
DISCARD_BORDER = 8
# Number of images in a batch
BATCH_SIZE = 32
# Number of image loading workers (0 = use main process)
# Keep this at 0 if using the Udacity workspace.
NUM_LOADER_WORKERS = 0
#
# Transforms
#
# Image transform pipeline for training.
#
train_transform = tv.transforms.Compose([
# Resize first, to reduce the amount of data for the next steps.
tv.transforms.RandomResizedCrop(int(TRANSFORM_WIDTH_HEIGHT*1.2)),
# Random rotation.
# (The `resample` parameter is deprecated, but needed for compatibility with the Udacity workspace.)
tv.transforms.RandomRotation(degrees=10, resample=Image.BILINEAR),
# Crop to final size; the final crop region will be completely inside the rotated image, without
# showing any part of the black triangles created by the rotation.
tv.transforms.CenterCrop(TRANSFORM_WIDTH_HEIGHT),
# Color jitter (weather / time of day)
tv.transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.05, hue=0.01),
# Convert to normalized floating point
tv.transforms.ToTensor(),
tv.transforms.Normalize(mean=TRANSFORM_MEAN, std=TRANSFORM_STD)
])
# Deterministic image transform pipeline for validation and testing.
#
eval_transform = tv.transforms.Compose([
tv.transforms.Resize(TRANSFORM_WIDTH_HEIGHT + DISCARD_BORDER * 2),
tv.transforms.CenterCrop(TRANSFORM_WIDTH_HEIGHT),
tv.transforms.ToTensor(),
tv.transforms.Normalize(mean=TRANSFORM_MEAN, std=TRANSFORM_STD)
])
#
# Datasets and samplers
#
# Training, validation, and testing datasets
train_dataset = tv.datasets.ImageFolder(IMAGES_DIR + 'train', transform=train_transform)
valid_dataset = tv.datasets.ImageFolder(IMAGES_DIR + 'train', transform=eval_transform)
test_dataset = tv.datasets.ImageFolder(IMAGES_DIR + 'test', transform=eval_transform)
# Get the index-to-class mapping
classes_dict = train_dataset.classes
# Pick a random subset of the training images for validation
all_indices = list(range(len(train_dataset)))
np.random.shuffle(all_indices)
split = int(np.floor(len(train_dataset) * VALIDATION_FRAC))
valid_idx, train_idx = all_indices[:split], all_indices[split:]
# Define samplers for training and validation.
# Also randomize the order in which the images are presented for training.
train_sampler = torch.utils.data.sampler.SubsetRandomSampler(train_idx)
valid_sampler = torch.utils.data.sampler.SubsetRandomSampler(valid_idx)
# Remember the sizes of the training and validation sets.
# (`len(loader.dataset)` cannot be used; it does not take the samplers into account.)
train_dataset_size = len(train_sampler)
valid_dataset_size = len(valid_sampler)
#
# Image loaders
#
loaders_scratch = {
'train': torch.utils.data.DataLoader(
train_dataset,
batch_size = BATCH_SIZE,
sampler = train_sampler,
num_workers = NUM_LOADER_WORKERS
),
'valid': torch.utils.data.DataLoader(
valid_dataset,
batch_size = BATCH_SIZE,
sampler = valid_sampler,
num_workers = NUM_LOADER_WORKERS
),
'test': torch.utils.data.DataLoader(
test_dataset,
batch_size = BATCH_SIZE,
num_workers = NUM_LOADER_WORKERS
),
}
# Print the dataset sizes (rounded up to a whole number of batches) for sanity checking
for name, loader in loaders_scratch.items():
print(name + ":", BATCH_SIZE * len(loader))
train: 4000 valid: 1024 test: 1280
/home/rik/Documents/local/src/udacity_ml/anaconda/anaconda/lib/python3.8/site-packages/torchvision/transforms/transforms.py:1200: UserWarning: Argument resample is deprecated and will be removed since v0.10.0. Please, use interpolation instead warnings.warn(
Question 1: Describe your chosen procedure for preprocessing the data.
Answer:
The input tensor size is 128x128 pixels. This was my first choice, because it is relatively small, so the network can be small as well, which results in faster training. (VGG19, which is a much bigger network, uses only a moderately bigger input tensor of 224x224.)
This size turned out to preserve enough detail to work well, so I kept it.
The training dataset is augmented with random transforms: random resized crops, small random rotations, and slight color jitter (see the training transform pipeline above).
The validation and test datasets are transformed in a deterministic way, to get accurate and consistent results. The image is first resized so that its shorter side is 144 pixels, then it is center-cropped to 128x128. Discarding this border region gives more emphasis to the main area of the picture and excludes irrelevant objects at the edges. (Many pictures have cars, foliage, people, or other distractions partially visible near the edge of the frame.)
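The normalization mean and standard deviation above were informed by sampling images from the dataset; a minimal sketch of how such an estimate could be computed is shown below, reusing the imports and constants defined earlier in the notebook. The sample size and the plain resize-and-ToTensor pipeline are illustrative assumptions, not the exact procedure used.
# Hedged sketch: estimate per-channel mean/std from a random sample of training images.
# The sample size and the plain resize + ToTensor pipeline are illustrative choices.
sample_size = 256
plain_dataset = tv.datasets.ImageFolder(
    IMAGES_DIR + 'train',
    transform=tv.transforms.Compose([
        tv.transforms.Resize((TRANSFORM_WIDTH_HEIGHT, TRANSFORM_WIDTH_HEIGHT)),
        tv.transforms.ToTensor(),
    ])
)
sample_loader = torch.utils.data.DataLoader(
    plain_dataset,
    batch_size = sample_size,
    sampler = torch.utils.data.RandomSampler(plain_dataset, replacement=True, num_samples=sample_size)
)
sample_images, _ = next(iter(sample_loader))
# Collapse the batch and spatial dimensions, keeping the channel dimension.
per_channel = sample_images.permute(1, 0, 2, 3).reshape(3, -1)
print("mean:", per_channel.mean(dim=1))
print("std :", per_channel.std(dim=1))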
Use the code cell below to retrieve a batch of images from your train data loader, display at least 5 images simultaneously, and label each displayed image with its class name (e.g., "Golden Gate Bridge").
Visualizing the output of your data loader is a great way to ensure that your data loading and preprocessing are working as expected.
import matplotlib.pyplot as plt
%matplotlib inline
## Done: visualize a batch of the train data loader
PLOT_ROWS = 2
PLOT_COLUMNS = 5
PLOT_FIGURE_SIZE = (16, 8)
assert(PLOT_ROWS * PLOT_COLUMNS <= BATCH_SIZE)
## the class names can be accessed at the `classes` attribute
## of your dataset object (e.g., `train_dataset.classes`)
# Convert and plot an image on the current pyplot figure
# `image` is a numpy array with the inner 3 dimensions containing color, width, and height respectively.
def plot_image(image):
# Rearrange the dimensions from tensor (channels, width, height) to pyplot (width, height, channels).
image = np.transpose(image, (1, 2, 0))
# Reverse the color normalization
image = image * TRANSFORM_STD + TRANSFORM_MEAN
# Plot the image
plt.imshow(image)
images, labels = next(iter(loaders_scratch['train']))
images = images.numpy()
fig = plt.figure(figsize=PLOT_FIGURE_SIZE)
for i in np.arange(PLOT_ROWS * PLOT_COLUMNS):
ax = fig.add_subplot(PLOT_ROWS, PLOT_COLUMNS, i+1, xticks=(), yticks=())
plot_image(images[i])
ax.set_title(classes_dict[labels[i]])
# useful variable that tells us whether we should use the GPU
use_cuda = torch.cuda.is_available()
Use the next code cell to specify a loss function and optimizer. Save the chosen loss function as criterion_scratch, and fill in the function get_optimizer_scratch below.
## Done: select loss function
# I like using LogSoftmax for the output activation function, because it is tidier:
# all output probabilities sum to 1.0.
# To match this, the criterion needs to be negative log likelihood loss.
criterion_scratch = nn.NLLLoss()
def get_optimizer_scratch(model):
## Done: select and return an optimizer
# The Adam algorithm appears to work quite well here.
return optim.Adam(model.parameters(), lr=0.001)
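For reference, the LogSoftmax + NLLLoss pairing is numerically equivalent to applying nn.CrossEntropyLoss directly to raw logits; a tiny check with made-up tensors (illustrative only):
# Illustrative check (made-up tensors): LogSoftmax + NLLLoss equals CrossEntropyLoss on raw logits.
logits = torch.randn(4, 50)             # pretend batch of 4 images, 50 classes
targets = torch.randint(0, 50, (4,))    # pretend labels
loss_nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
loss_ce = nn.CrossEntropyLoss()(logits, targets)
print(torch.allclose(loss_nll, loss_ce))  # expected: True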
Create a CNN to classify images of landmarks. Use the template in the code cell below.
import torch.nn as nn
# define the CNN architecture
class Net(nn.Module):
## Done: choose an architecture, and complete the class
# Feature count of each convolutional layer
conv_layers = [8, 16, 32]
# Output node count for each classifier layer
class_layers = [2048, 512, 50]
# Dropout probabilities
DROPOUT_CONV = 0.2
DROPOUT_FC = 0.3
# Create a pooling layer; returns (new_size, module)
def get_pool_layer(self, size):
return size // 2, nn.MaxPool2d(
kernel_size = 2,
stride = 2
)
def __init__(self):
super(Net, self).__init__()
## Define layers of a CNN
# Create a `conv_module` sequence containing the convolutional layers.
#
image_size = TRANSFORM_WIDTH_HEIGHT
in_features = 3 # Start with 3 'features' (RGB color channels)
stacks = []
for out_features in self.conv_layers:
layers = OrderedDict()
# Convolutional layer
layers['cv'] = nn.Conv2d(
in_channels = in_features,
out_channels = out_features,
kernel_size = 3,
padding = 1
)
# Set `in_features` for the next layer
in_features = out_features
# Activation function
layers['act'] = nn.ReLU()
# Pooling layer
image_size, pool_layer = self.get_pool_layer(image_size)
layers['pool'] = pool_layer
stacks.append(nn.Sequential(layers))
self.conv_module = nn.Sequential(*stacks)
# 2D dropout; applied after the last convolutional stack
#
self.drop2d = nn.Dropout2d(p=self.DROPOUT_CONV)
# Create a `classifier` module containing the fully connected layers.
#
in_nodes = in_features * image_size ** 2
layers = OrderedDict()
for layer_i, out_nodes in enumerate(self.class_layers, start=1):
name = f'l{layer_i}'
# Fully connected layer
layers[name] = nn.Linear(in_nodes, out_nodes)
# Set `in_nodes` for the next layer
in_nodes = out_nodes
# Use a LogSoftMax activation function for the final layer,
# and ReLU plus dropout for the others.
if layer_i < len(self.class_layers):
layers[name + 'act'] = nn.ReLU()
layers[name + 'drop'] = nn.Dropout(p=self.DROPOUT_FC)
else:
layers[name + 'act'] = nn.LogSoftmax(dim=1)
self.classifier = nn.Sequential(layers)
def forward(self, x):
## Define forward behavior
#
x = self.conv_module(x)
x = self.drop2d(x)
x = x.view(x.size(0), -1)
x = self.classifier(x)
return x
#-#-# Do NOT modify the code below this line. #-#-#
# instantiate the CNN
model_scratch = Net()
# move tensors to GPU if CUDA is available
if use_cuda:
model_scratch.cuda()
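As an optional sanity check (not part of the template), a dummy forward pass can confirm that the untrained network produces one log-probability per class; the batch size of 2 is arbitrary.
# Optional sanity check: a dummy batch should produce a (batch, 50) tensor of log-probabilities.
dummy = torch.zeros(2, 3, TRANSFORM_WIDTH_HEIGHT, TRANSFORM_WIDTH_HEIGHT)
if use_cuda:
    dummy = dummy.cuda()
with torch.no_grad():
    print(model_scratch(dummy).shape)  # expected: torch.Size([2, 50])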
Question 2: Outline the steps you took to get to your final CNN architecture and your reasoning at each step.
Answer:
I started with a CNN consisting of several convolutional layer stacks, followed by a series of fully connected layers. Initially, the input transform stack did not have the color jitter step yet; that was added later.
Each convolutional stack contained:
Each fully connected layer had the following properties:
The result was not very good. This may be due to the network being too small for this task.
Increase the size and number of layers.
This change showed no material improvement.
Increase the size again. Also remove the dropouts between the convolutional stacks, as they may filter out too much information for the earlier layers to train well.
The result was very similar to attempt 3.
Maybe the initial problem was the presence of the many dropouts, rather than the size of the network. It may be the case that the network is now too big, and is suffering from a vanishing gradient problem. Try again with a much smaller network.
This result was much better; it attained a test accuracy of 40%.
The result after these final adjustments is the one shown below in the notebook.
Implement your training algorithm in the code cell below. Save the final model parameters at the filepath stored in the variable save_path.
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
"""returns trained model"""
# initialize tracker for minimum validation loss
valid_loss_min = np.Inf
for epoch in range(1, n_epochs+1):
# initialize variables to monitor training and validation loss
train_loss_accum = 0.0
train_loss = 0.0
valid_loss_accum = 0.0
valid_loss = 0.0
###################
# train the model #
###################
# set the module to training mode
model.train()
for batch_idx, (data, target) in enumerate(loaders['train']):
# move to GPU
if use_cuda:
data, target = data.cuda(), target.cuda()
## Done: find the loss and update the model parameters accordingly
## record the average training loss, using something like
## train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data.item() - train_loss))
# Reset gradients
optimizer.zero_grad()
# Compute the class predictions and loss
log_ps = model(data)
loss = criterion(log_ps, target)
# Accumulate training loss.
# The loss is an average for this batch, so multiply it with the batch size
# to get a more accurate average loss for the whole dataset.
train_loss_accum += loss.item() * data.size(0)
# Perform an optimization step.
loss.backward()
optimizer.step()
# Get the average training loss by dividing by the total number of images.
train_loss = train_loss_accum / train_dataset_size
######################
# validate the model #
######################
# set the model to evaluation mode
model.eval()
with torch.no_grad():
for batch_idx, (data, target) in enumerate(loaders['valid']):
# move to GPU
if use_cuda:
data, target = data.cuda(), target.cuda()
## Done: update average validation loss
log_ps = model(data)
loss = criterion(log_ps, target)
# Accumulate validation loss.
# The loss is an average for this batch, so multiply it with the batch size
# to get a more accurate average loss for the whole dataset.
valid_loss_accum += loss.item() * data.size(0)
# Get the average validation loss by dividing by the total number of images.
valid_loss = valid_loss_accum / valid_dataset_size
# print training/validation statistics
print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
epoch,
train_loss,
valid_loss
))
## Done: if the validation loss has decreased, save the model at the filepath stored in save_path
if valid_loss < valid_loss_min:
print(f'\tValidation loss decreased: ({valid_loss_min:0.6f} -> {valid_loss:0.6f}); saving state.')
torch.save(model.state_dict(), save_path)
valid_loss_min = valid_loss
return model
Use the code cell below to define a custom weight initialization, and then train with your weight initialization for a few epochs. Make sure that neither the training loss nor validation loss is nan.
Later on, you will be able to see how this compares to training with PyTorch's default weight initialization.
def custom_weight_init(m):
## Done: implement a weight initialization strategy
# Trying a normal instead of uniform distribution for the weights,
# and very small uniform random values for the bias.
if isinstance(m, (nn.Linear, nn.Conv2d)):
y = 1 / np.sqrt(m.weight.data.numel())
m.weight.data.normal_(0, y)
m.bias.data.uniform_(-0.01*y, 0.01*y)
#-#-# Do NOT modify the code below this line. #-#-#
model_scratch.apply(custom_weight_init)
model_scratch = train(20, loaders_scratch, model_scratch, get_optimizer_scratch(model_scratch),
criterion_scratch, use_cuda, 'ignore.pt')
Epoch: 1 Training Loss: 3.901040 Validation Loss: 3.839182 Validation loss decreased: (inf -> 3.839182); saving state. Epoch: 2 Training Loss: 3.831972 Validation Loss: 3.754531 Validation loss decreased: (3.839182 -> 3.754531); saving state. Epoch: 3 Training Loss: 3.779134 Validation Loss: 3.725163 Validation loss decreased: (3.754531 -> 3.725163); saving state. Epoch: 4 Training Loss: 3.760922 Validation Loss: 3.719596 Validation loss decreased: (3.725163 -> 3.719596); saving state. Epoch: 5 Training Loss: 3.737019 Validation Loss: 3.696958 Validation loss decreased: (3.719596 -> 3.696958); saving state. Epoch: 6 Training Loss: 3.727356 Validation Loss: 3.669838 Validation loss decreased: (3.696958 -> 3.669838); saving state. Epoch: 7 Training Loss: 3.708845 Validation Loss: 3.630042 Validation loss decreased: (3.669838 -> 3.630042); saving state. Epoch: 8 Training Loss: 3.677196 Validation Loss: 3.629888 Validation loss decreased: (3.630042 -> 3.629888); saving state. Epoch: 9 Training Loss: 3.662066 Validation Loss: 3.618767 Validation loss decreased: (3.629888 -> 3.618767); saving state. Epoch: 10 Training Loss: 3.633771 Validation Loss: 3.573365 Validation loss decreased: (3.618767 -> 3.573365); saving state. Epoch: 11 Training Loss: 3.629611 Validation Loss: 3.533444 Validation loss decreased: (3.573365 -> 3.533444); saving state. Epoch: 12 Training Loss: 3.579576 Validation Loss: 3.496405 Validation loss decreased: (3.533444 -> 3.496405); saving state. Epoch: 13 Training Loss: 3.578005 Validation Loss: 3.521892 Epoch: 14 Training Loss: 3.548417 Validation Loss: 3.456110 Validation loss decreased: (3.496405 -> 3.456110); saving state. Epoch: 15 Training Loss: 3.532484 Validation Loss: 3.436028 Validation loss decreased: (3.456110 -> 3.436028); saving state. Epoch: 16 Training Loss: 3.499780 Validation Loss: 3.409051 Validation loss decreased: (3.436028 -> 3.409051); saving state. Epoch: 17 Training Loss: 3.497592 Validation Loss: 3.387580 Validation loss decreased: (3.409051 -> 3.387580); saving state. Epoch: 18 Training Loss: 3.482498 Validation Loss: 3.370907 Validation loss decreased: (3.387580 -> 3.370907); saving state. Epoch: 19 Training Loss: 3.463479 Validation Loss: 3.327892 Validation loss decreased: (3.370907 -> 3.327892); saving state. Epoch: 20 Training Loss: 3.429636 Validation Loss: 3.341362
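Other initialization strategies could be plugged into the same apply() mechanism; for example, a Kaiming-style variant is sketched below (illustrative only, not the initialization used for the results shown above):
# Illustrative sketch of an alternative initialization; not used for the results in this notebook.
def kaiming_weight_init(m):
    if isinstance(m, (nn.Linear, nn.Conv2d)):
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        nn.init.constant_(m.bias, 0.0)
# Usage would mirror the cell above, e.g. model_scratch.apply(kaiming_weight_init)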
Run the next code cell to train your model.
## Done: you may change the number of epochs if you'd like,
## but changing it is not required
#
# Both the training and validation loss sometimes keep decreasing after 100 epochs.
# Maybe training for a bit longer can bring a small improvement.
num_epochs = 150
#-#-# Do NOT modify the code below this line. #-#-#
# function to re-initialize a model with pytorch's default weight initialization
def default_weight_init(m):
reset_parameters = getattr(m, 'reset_parameters', None)
if callable(reset_parameters):
m.reset_parameters()
# reset the model parameters
model_scratch.apply(default_weight_init)
# train the model
model_scratch = train(num_epochs, loaders_scratch, model_scratch, get_optimizer_scratch(model_scratch),
criterion_scratch, use_cuda, 'model_scratch.pt')
Epoch: 1 Training Loss: 3.899335 Validation Loss: 3.847374 Validation loss decreased: (inf -> 3.847374); saving state. Epoch: 2 Training Loss: 3.836148 Validation Loss: 3.780293 Validation loss decreased: (3.847374 -> 3.780293); saving state. Epoch: 3 Training Loss: 3.771421 Validation Loss: 3.675713 Validation loss decreased: (3.780293 -> 3.675713); saving state. Epoch: 4 Training Loss: 3.716303 Validation Loss: 3.623645 Validation loss decreased: (3.675713 -> 3.623645); saving state. Epoch: 5 Training Loss: 3.648545 Validation Loss: 3.539353 Validation loss decreased: (3.623645 -> 3.539353); saving state. Epoch: 6 Training Loss: 3.594820 Validation Loss: 3.434169 Validation loss decreased: (3.539353 -> 3.434169); saving state. Epoch: 7 Training Loss: 3.533057 Validation Loss: 3.382826 Validation loss decreased: (3.434169 -> 3.382826); saving state. Epoch: 8 Training Loss: 3.477527 Validation Loss: 3.359632 Validation loss decreased: (3.382826 -> 3.359632); saving state. Epoch: 9 Training Loss: 3.433324 Validation Loss: 3.296151 Validation loss decreased: (3.359632 -> 3.296151); saving state. Epoch: 10 Training Loss: 3.384177 Validation Loss: 3.223458 Validation loss decreased: (3.296151 -> 3.223458); saving state. Epoch: 11 Training Loss: 3.321819 Validation Loss: 3.194892 Validation loss decreased: (3.223458 -> 3.194892); saving state. Epoch: 12 Training Loss: 3.295227 Validation Loss: 3.144113 Validation loss decreased: (3.194892 -> 3.144113); saving state. Epoch: 13 Training Loss: 3.267806 Validation Loss: 3.124246 Validation loss decreased: (3.144113 -> 3.124246); saving state. Epoch: 14 Training Loss: 3.207781 Validation Loss: 3.061724 Validation loss decreased: (3.124246 -> 3.061724); saving state. Epoch: 15 Training Loss: 3.179238 Validation Loss: 3.008696 Validation loss decreased: (3.061724 -> 3.008696); saving state. Epoch: 16 Training Loss: 3.095997 Validation Loss: 2.969356 Validation loss decreased: (3.008696 -> 2.969356); saving state. Epoch: 17 Training Loss: 3.067133 Validation Loss: 2.904145 Validation loss decreased: (2.969356 -> 2.904145); saving state. Epoch: 18 Training Loss: 3.068944 Validation Loss: 2.902626 Validation loss decreased: (2.904145 -> 2.902626); saving state. Epoch: 19 Training Loss: 3.023323 Validation Loss: 2.858214 Validation loss decreased: (2.902626 -> 2.858214); saving state. Epoch: 20 Training Loss: 2.988289 Validation Loss: 2.822849 Validation loss decreased: (2.858214 -> 2.822849); saving state. Epoch: 21 Training Loss: 2.949840 Validation Loss: 2.727102 Validation loss decreased: (2.822849 -> 2.727102); saving state. Epoch: 22 Training Loss: 2.955485 Validation Loss: 2.732851 Epoch: 23 Training Loss: 2.904690 Validation Loss: 2.709780 Validation loss decreased: (2.727102 -> 2.709780); saving state. Epoch: 24 Training Loss: 2.873222 Validation Loss: 2.706371 Validation loss decreased: (2.709780 -> 2.706371); saving state. Epoch: 25 Training Loss: 2.860411 Validation Loss: 2.690273 Validation loss decreased: (2.706371 -> 2.690273); saving state. Epoch: 26 Training Loss: 2.818810 Validation Loss: 2.652607 Validation loss decreased: (2.690273 -> 2.652607); saving state. Epoch: 27 Training Loss: 2.795276 Validation Loss: 2.641615 Validation loss decreased: (2.652607 -> 2.641615); saving state. Epoch: 28 Training Loss: 2.778664 Validation Loss: 2.632715 Validation loss decreased: (2.641615 -> 2.632715); saving state. Epoch: 29 Training Loss: 2.765707 Validation Loss: 2.566048 Validation loss decreased: (2.632715 -> 2.566048); saving state. 
Epoch: 30 Training Loss: 2.733282 Validation Loss: 2.601225 Epoch: 31 Training Loss: 2.717069 Validation Loss: 2.606453 Epoch: 32 Training Loss: 2.721450 Validation Loss: 2.560993 Validation loss decreased: (2.566048 -> 2.560993); saving state. Epoch: 33 Training Loss: 2.658984 Validation Loss: 2.523309 Validation loss decreased: (2.560993 -> 2.523309); saving state. Epoch: 34 Training Loss: 2.698885 Validation Loss: 2.519456 Validation loss decreased: (2.523309 -> 2.519456); saving state. Epoch: 35 Training Loss: 2.634358 Validation Loss: 2.604380 Epoch: 36 Training Loss: 2.644578 Validation Loss: 2.476175 Validation loss decreased: (2.519456 -> 2.476175); saving state. Epoch: 37 Training Loss: 2.618167 Validation Loss: 2.522045 Epoch: 38 Training Loss: 2.598864 Validation Loss: 2.551885 Epoch: 39 Training Loss: 2.664760 Validation Loss: 2.501117 Epoch: 40 Training Loss: 2.586813 Validation Loss: 2.477528 Epoch: 41 Training Loss: 2.586626 Validation Loss: 2.449684 Validation loss decreased: (2.476175 -> 2.449684); saving state. Epoch: 42 Training Loss: 2.578539 Validation Loss: 2.472177 Epoch: 43 Training Loss: 2.563312 Validation Loss: 2.436947 Validation loss decreased: (2.449684 -> 2.436947); saving state. Epoch: 44 Training Loss: 2.535825 Validation Loss: 2.414509 Validation loss decreased: (2.436947 -> 2.414509); saving state. Epoch: 45 Training Loss: 2.499512 Validation Loss: 2.410908 Validation loss decreased: (2.414509 -> 2.410908); saving state. Epoch: 46 Training Loss: 2.517632 Validation Loss: 2.427469 Epoch: 47 Training Loss: 2.484260 Validation Loss: 2.414692 Epoch: 48 Training Loss: 2.457306 Validation Loss: 2.449129 Epoch: 49 Training Loss: 2.502186 Validation Loss: 2.422275 Epoch: 50 Training Loss: 2.496033 Validation Loss: 2.352331 Validation loss decreased: (2.410908 -> 2.352331); saving state. Epoch: 51 Training Loss: 2.473963 Validation Loss: 2.411161 Epoch: 52 Training Loss: 2.421796 Validation Loss: 2.418884 Epoch: 53 Training Loss: 2.448790 Validation Loss: 2.371706 Epoch: 54 Training Loss: 2.459541 Validation Loss: 2.327170 Validation loss decreased: (2.352331 -> 2.327170); saving state. Epoch: 55 Training Loss: 2.393109 Validation Loss: 2.393359 Epoch: 56 Training Loss: 2.407450 Validation Loss: 2.424933 Epoch: 57 Training Loss: 2.384893 Validation Loss: 2.332342 Epoch: 58 Training Loss: 2.377339 Validation Loss: 2.393293 Epoch: 59 Training Loss: 2.365554 Validation Loss: 2.325413 Validation loss decreased: (2.327170 -> 2.325413); saving state. Epoch: 60 Training Loss: 2.361472 Validation Loss: 2.324012 Validation loss decreased: (2.325413 -> 2.324012); saving state. Epoch: 61 Training Loss: 2.338151 Validation Loss: 2.345835 Epoch: 62 Training Loss: 2.377606 Validation Loss: 2.342189 Epoch: 63 Training Loss: 2.326020 Validation Loss: 2.321831 Validation loss decreased: (2.324012 -> 2.321831); saving state. Epoch: 64 Training Loss: 2.291445 Validation Loss: 2.315025 Validation loss decreased: (2.321831 -> 2.315025); saving state. Epoch: 65 Training Loss: 2.357487 Validation Loss: 2.298159 Validation loss decreased: (2.315025 -> 2.298159); saving state. Epoch: 66 Training Loss: 2.326566 Validation Loss: 2.359640 Epoch: 67 Training Loss: 2.282244 Validation Loss: 2.340591 Epoch: 68 Training Loss: 2.301509 Validation Loss: 2.368995 Epoch: 69 Training Loss: 2.269802 Validation Loss: 2.332227 Epoch: 70 Training Loss: 2.315059 Validation Loss: 2.288169 Validation loss decreased: (2.298159 -> 2.288169); saving state. 
Epoch: 71 Training Loss: 2.275160 Validation Loss: 2.323914 Epoch: 72 Training Loss: 2.250338 Validation Loss: 2.290810 Epoch: 73 Training Loss: 2.262610 Validation Loss: 2.269728 Validation loss decreased: (2.288169 -> 2.269728); saving state. Epoch: 74 Training Loss: 2.250010 Validation Loss: 2.263656 Validation loss decreased: (2.269728 -> 2.263656); saving state. Epoch: 75 Training Loss: 2.248517 Validation Loss: 2.406200 Epoch: 76 Training Loss: 2.263829 Validation Loss: 2.317054 Epoch: 77 Training Loss: 2.200851 Validation Loss: 2.241054 Validation loss decreased: (2.263656 -> 2.241054); saving state. Epoch: 78 Training Loss: 2.241431 Validation Loss: 2.293105 Epoch: 79 Training Loss: 2.226943 Validation Loss: 2.323175 Epoch: 80 Training Loss: 2.187220 Validation Loss: 2.371150 Epoch: 81 Training Loss: 2.246320 Validation Loss: 2.303529 Epoch: 82 Training Loss: 2.157689 Validation Loss: 2.378420 Epoch: 83 Training Loss: 2.220291 Validation Loss: 2.302539 Epoch: 84 Training Loss: 2.183995 Validation Loss: 2.291345 Epoch: 85 Training Loss: 2.200101 Validation Loss: 2.223583 Validation loss decreased: (2.241054 -> 2.223583); saving state. Epoch: 86 Training Loss: 2.191945 Validation Loss: 2.312970 Epoch: 87 Training Loss: 2.172434 Validation Loss: 2.295572 Epoch: 88 Training Loss: 2.221945 Validation Loss: 2.250891 Epoch: 89 Training Loss: 2.137854 Validation Loss: 2.260082 Epoch: 90 Training Loss: 2.167188 Validation Loss: 2.246317 Epoch: 91 Training Loss: 2.133981 Validation Loss: 2.241277 Epoch: 92 Training Loss: 2.144203 Validation Loss: 2.242751 Epoch: 93 Training Loss: 2.144478 Validation Loss: 2.229555 Epoch: 94 Training Loss: 2.096995 Validation Loss: 2.354661 Epoch: 95 Training Loss: 2.103531 Validation Loss: 2.259861 Epoch: 96 Training Loss: 2.085585 Validation Loss: 2.297113 Epoch: 97 Training Loss: 2.119055 Validation Loss: 2.287510 Epoch: 98 Training Loss: 2.070248 Validation Loss: 2.339523 Epoch: 99 Training Loss: 2.073824 Validation Loss: 2.290658 Epoch: 100 Training Loss: 2.071812 Validation Loss: 2.260773 Epoch: 101 Training Loss: 2.075973 Validation Loss: 2.266833 Epoch: 102 Training Loss: 2.104458 Validation Loss: 2.317105 Epoch: 103 Training Loss: 2.060086 Validation Loss: 2.226689 Epoch: 104 Training Loss: 2.061575 Validation Loss: 2.271826 Epoch: 105 Training Loss: 2.083307 Validation Loss: 2.338771 Epoch: 106 Training Loss: 2.087151 Validation Loss: 2.267190 Epoch: 107 Training Loss: 2.060855 Validation Loss: 2.288147 Epoch: 108 Training Loss: 2.024369 Validation Loss: 2.256384 Epoch: 109 Training Loss: 2.044462 Validation Loss: 2.259928 Epoch: 110 Training Loss: 2.053152 Validation Loss: 2.244729 Epoch: 111 Training Loss: 1.989267 Validation Loss: 2.234622 Epoch: 112 Training Loss: 2.004123 Validation Loss: 2.301481 Epoch: 113 Training Loss: 1.995962 Validation Loss: 2.259017 Epoch: 114 Training Loss: 2.023939 Validation Loss: 2.273876 Epoch: 115 Training Loss: 1.997770 Validation Loss: 2.291287 Epoch: 116 Training Loss: 1.979389 Validation Loss: 2.264531 Epoch: 117 Training Loss: 1.969527 Validation Loss: 2.271965 Epoch: 118 Training Loss: 1.991921 Validation Loss: 2.229649 Epoch: 119 Training Loss: 1.999476 Validation Loss: 2.196682 Validation loss decreased: (2.223583 -> 2.196682); saving state. 
Epoch: 120 Training Loss: 1.981184 Validation Loss: 2.312677 Epoch: 121 Training Loss: 1.978859 Validation Loss: 2.281926 Epoch: 122 Training Loss: 1.965787 Validation Loss: 2.257266 Epoch: 123 Training Loss: 1.995190 Validation Loss: 2.290146 Epoch: 124 Training Loss: 1.997304 Validation Loss: 2.230324 Epoch: 125 Training Loss: 1.961630 Validation Loss: 2.281488 Epoch: 126 Training Loss: 1.956995 Validation Loss: 2.252753 Epoch: 127 Training Loss: 1.923779 Validation Loss: 2.261373 Epoch: 128 Training Loss: 1.951471 Validation Loss: 2.183281 Validation loss decreased: (2.196682 -> 2.183281); saving state. Epoch: 129 Training Loss: 1.955444 Validation Loss: 2.260079 Epoch: 130 Training Loss: 1.933264 Validation Loss: 2.245733 Epoch: 131 Training Loss: 1.937916 Validation Loss: 2.253158 Epoch: 132 Training Loss: 1.934735 Validation Loss: 2.217683 Epoch: 133 Training Loss: 1.978676 Validation Loss: 2.268979 Epoch: 134 Training Loss: 1.902112 Validation Loss: 2.188208 Epoch: 135 Training Loss: 1.916661 Validation Loss: 2.238514 Epoch: 136 Training Loss: 1.970755 Validation Loss: 2.275482 Epoch: 137 Training Loss: 1.957368 Validation Loss: 2.287369 Epoch: 138 Training Loss: 1.955945 Validation Loss: 2.235814 Epoch: 139 Training Loss: 1.936085 Validation Loss: 2.252082 Epoch: 140 Training Loss: 1.935533 Validation Loss: 2.276308 Epoch: 141 Training Loss: 1.944268 Validation Loss: 2.268088 Epoch: 142 Training Loss: 1.901783 Validation Loss: 2.298959 Epoch: 143 Training Loss: 1.881035 Validation Loss: 2.337479 Epoch: 144 Training Loss: 1.901490 Validation Loss: 2.230447 Epoch: 145 Training Loss: 1.927151 Validation Loss: 2.306324 Epoch: 146 Training Loss: 1.832469 Validation Loss: 2.300926 Epoch: 147 Training Loss: 1.907862 Validation Loss: 2.287448 Epoch: 148 Training Loss: 1.892336 Validation Loss: 2.317176 Epoch: 149 Training Loss: 1.901571 Validation Loss: 2.225502 Epoch: 150 Training Loss: 1.869667 Validation Loss: 2.247699
Try out your model on the test dataset of landmark images. Run the code cell below to calculate and print the test loss and accuracy. Ensure that your test accuracy is greater than 20%.
def test(loaders, model, criterion, use_cuda):
# monitor test loss and accuracy
test_loss = 0.
correct = 0.
total = 0.
# set the model to evaluation mode
model.eval()
for batch_idx, (data, target) in enumerate(loaders['test']):
# move to GPU
if use_cuda:
data, target = data.cuda(), target.cuda()
# forward pass: compute predicted outputs by passing inputs to the model
output = model(data)
# calculate the loss
loss = criterion(output, target)
# update average test loss
test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.data.item() - test_loss))
# convert output probabilities to predicted class
pred = output.data.max(1, keepdim=True)[1]
# compare predictions to true label
correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
total += data.size(0)
print('Test Loss: {:.6f}\n'.format(test_loss))
print('\nTest Accuracy: %2d%% (%2d/%2d)' % (
100. * correct / total, correct, total))
# load the model that got the best validation accuracy
model_scratch.load_state_dict(torch.load('model_scratch.pt'))
test(loaders_scratch, model_scratch, criterion_scratch, use_cuda)
Test Loss: 2.153708 Test Accuracy: 46% (575/1250)
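Since the final application will suggest the top k most relevant landmarks, it can also be interesting to measure top-k accuracy on the test set; a minimal sketch follows (the helper name and k=3 are arbitrary choices, not part of the template):
# Hedged sketch: top-k accuracy on the test set; `test_topk` and k=3 are arbitrary choices.
def test_topk(loader, model, k, use_cuda):
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for data, target in loader:
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            # A prediction counts as correct if the true class is among the k highest scores.
            _, topk_idx = model(data).topk(k, dim=1)
            correct += (topk_idx == target.view(-1, 1)).any(dim=1).sum().item()
            total += data.size(0)
    print('Top-%d Accuracy: %.1f%% (%d/%d)' % (k, 100. * correct / total, correct, total))
# Example: test_topk(loaders_scratch['test'], model_scratch, 3, use_cuda)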
You will now use transfer learning to create a CNN that can identify landmarks from images. Your CNN must attain at least 60% accuracy on the test set.
Use the code cell below to create three separate data loaders: one for training data, one for validation data, and one for test data. Randomly split the images located at landmark_images/train to create the train and validation data loaders, and use the images located at landmark_images/test to create the test data loader.
All three of your data loaders should be accessible via a dictionary named loaders_transfer. Your train data loader should be at loaders_transfer['train'], your validation data loader should be at loaders_transfer['valid'], and your test data loader should be at loaders_transfer['test'].
If you like, you are welcome to use the same data loaders from the previous step, when you created a CNN from scratch.
### Done: Write data loaders for training, validation, and test sets
## Specify appropriate transforms, and batch_sizes
#
# Updated parameters
#
# Width and height of input image in pixels after transforms
# This must match the chosen base architecture (VGG19).
#
TRANSFORM_WIDTH_HEIGHT = 224
# Parameters for the color normalization step.
# Match the parameters used for the pre-trained Torchvision models.
#
TRANSFORM_MEAN = [0.485, 0.456, 0.406]
TRANSFORM_STD = [0.229, 0.224, 0.225]
# Width of ignored border region in pixels, after scaling.
# Image is first resized to `TRANSFORM_WIDTH_HEIGHT + DISCARD_BORDER * 2`,
# then cropped to `TRANSFORM_WIDTH_HEIGHT`.
DISCARD_BORDER = 16
#
# Transforms
#
# Image transform pipeline for training.
# Use the same augmentation approach as with the from-scratch model.
#
train_transform = tv.transforms.Compose([
tv.transforms.RandomResizedCrop(int(TRANSFORM_WIDTH_HEIGHT*1.2)),
tv.transforms.RandomRotation(degrees=10, resample=Image.BILINEAR),
tv.transforms.CenterCrop(TRANSFORM_WIDTH_HEIGHT),
tv.transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.05, hue=0.01),
tv.transforms.ToTensor(),
tv.transforms.Normalize(mean=TRANSFORM_MEAN, std=TRANSFORM_STD)
])
# Deterministic image transform pipeline for validation and testing.
#
eval_transform = tv.transforms.Compose([
tv.transforms.Resize(TRANSFORM_WIDTH_HEIGHT + DISCARD_BORDER * 2),
tv.transforms.CenterCrop(TRANSFORM_WIDTH_HEIGHT),
tv.transforms.ToTensor(),
tv.transforms.Normalize(mean=TRANSFORM_MEAN, std=TRANSFORM_STD)
])
#
# Datasets and samplers
#
# Training, validation, and testing datasets
train_dataset = tv.datasets.ImageFolder(IMAGES_DIR + 'train', transform=train_transform)
valid_dataset = tv.datasets.ImageFolder(IMAGES_DIR + 'train', transform=eval_transform)
test_dataset = tv.datasets.ImageFolder(IMAGES_DIR + 'test', transform=eval_transform)
# Get the index to class mapping
classes_dict = train_dataset.classes
# Pick a random subset of the training images for validation
all_indices = list(range(len(train_dataset)))
np.random.shuffle(all_indices)
split = int(np.floor(len(train_dataset) * VALIDATION_FRAC))
valid_idx, train_idx = all_indices[:split], all_indices[split:]
# Define samplers for training and validation
train_sampler = torch.utils.data.sampler.SubsetRandomSampler(train_idx)
valid_sampler = torch.utils.data.sampler.SubsetRandomSampler(valid_idx)
train_dataset_size = len(train_sampler)
valid_dataset_size = len(valid_sampler)
#
# Image loaders
#
loaders_transfer = {
'train': torch.utils.data.DataLoader(
train_dataset,
batch_size = BATCH_SIZE,
sampler = train_sampler,
num_workers = NUM_LOADER_WORKERS
),
'valid': torch.utils.data.DataLoader(
valid_dataset,
batch_size = BATCH_SIZE,
sampler = valid_sampler,
num_workers = NUM_LOADER_WORKERS
),
'test': torch.utils.data.DataLoader(
test_dataset,
batch_size = BATCH_SIZE,
num_workers = NUM_LOADER_WORKERS
),
}
# Print the dataset sizes (rounded up to whole batch) for sanity checking
for name, loader in loaders_transfer.items():
print(name + ":", BATCH_SIZE * len(loader))
train: 4000 valid: 1024 test: 1280
Use the next code cell to specify a loss function and optimizer. Save the chosen loss function as criterion_transfer, and fill in the function get_optimizer_transfer below.
## Done: select loss function
criterion_transfer = nn.NLLLoss()
def get_optimizer_transfer(model):
## Done: select and return optimizer
#
# A simple stochastic gradient descent seems to work slightly better than Adam in this case.
#
# Only optimize the `classifier` parameters; the other parameters are fixed.
#
return optim.SGD(model.classifier.parameters(), lr=0.001)
Use transfer learning to create a CNN to classify images of landmarks. Use the code cell below, and save your initialized model as the variable model_transfer.
## Done: Specify model architecture
# Start with a pre-trained VGG19 model with batch normalization.
# Disable gradients for all parameters of the "features" module, as they are not needed.
#
model_transfer = tv.models.vgg19_bn(pretrained=True)
for p in model_transfer.features.parameters():
p.requires_grad = False
# Replace the classifier output layer with a new layer containing the required number of output nodes.
# Use Log-SoftMax as the activation function.
#
model_transfer.classifier[6] = nn.Linear(in_features=4096, out_features=50)
model_transfer.classifier.add_module('7', nn.LogSoftmax(dim=1))
#-#-# Do NOT modify the code below this line. #-#-#
if use_cuda:
model_transfer = model_transfer.cuda()
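As an optional check (not required by the template), printing the modified classifier and counting trainable versus frozen parameters confirms that only the classifier parameters remain trainable:
# Optional sanity check: inspect the replaced head and confirm which parameters will be trained.
print(model_transfer.classifier)
trainable = sum(p.numel() for p in model_transfer.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model_transfer.parameters() if not p.requires_grad)
print("trainable parameters:", trainable)
print("frozen parameters   :", frozen)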
Question 3: Outline the steps you took to get to your final CNN architecture and your reasoning at each step. Describe why you think the architecture is suitable for the current problem.
Answer:
Different configurations were tested. The following aspects were the same for all of these tests.
The base model is VGG19 with batch normalization. VGG is reasonably sized (not too big), while still achieving very good results on the ImageNet dataset.
The convolutional part of the model is kept static. Because this dataset is small compared to the ImageNet dataset on which VGG was trained, training the convolutional part of the model on this smaller dataset would probably reduce its generalization capabilities.
The following aspects were varied to see which option works best.
The classifier part was modified for this task, either by adding an extra layer, or by replacing the original output layer. All of the classifier layers were trained to make them adapt to the task of classifying the 50 landmarks.
The existing layers of the classifier part were trained either from the pre-trained weights, or from new random weights.
The optimizer used was either Adam with a learning rate of 0.001, or SGD with a learning rate of 0.01.
The following configuration was ultimately chosen because it had the lowest validation loss after 20 epochs: the original output layer is replaced (rather than adding an extra layer), the remaining classifier layers keep their pre-trained weights, and the optimizer is SGD.
This configuration is implemented below. A learning rate of 0.01 turned out to cause stagnation when training for longer than 20 epochs, so it was changed back to 0.001.
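For completeness, the other variant mentioned above, adding an extra layer after the original output layer instead of replacing it, could be sketched roughly as follows (illustrative layer sizes and variable names; not the configuration used below):
# Illustrative sketch of the "extra layer" variant discussed above; not the chosen configuration.
# It keeps the original 1000-class output layer and appends a new head on top of it.
variant_model = tv.models.vgg19_bn(pretrained=True)
for p in variant_model.features.parameters():
    p.requires_grad = False
variant_model.classifier.add_module('7', nn.ReLU())
variant_model.classifier.add_module('8', nn.Dropout(p=0.5))
variant_model.classifier.add_module('9', nn.Linear(1000, 50))
variant_model.classifier.add_module('10', nn.LogSoftmax(dim=1))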
Train and validate your model in the code cell below. Save the final model parameters at filepath 'model_transfer.pt'.
# Done: train the model and save the best model parameters at filepath 'model_transfer.pt'
EPOCHS = 100
# Keep all components in evaluation mode, except for the classifier.
model_transfer.eval()
optimizer = get_optimizer_transfer(model_transfer)
# initialize tracker for minimum validation loss
valid_loss_min = np.Inf
for epoch in range(1, EPOCHS+1):
train_loss_accum = 0.0
valid_loss_accum = 0.0
# Training loop
#
model_transfer.classifier.train()
for data, target in loaders_transfer['train']:
if use_cuda:
data, target = data.cuda(), target.cuda()
optimizer.zero_grad()
log_ps = model_transfer(data)
loss = criterion_transfer(log_ps, target)
train_loss_accum += loss.item() * data.size(0)
loss.backward()
optimizer.step()
# Validation loop
#
model_transfer.classifier.eval()
with torch.no_grad():
for data, target in loaders_transfer['valid']:
if use_cuda:
data, target = data.cuda(), target.cuda()
log_ps = model_transfer(data)
loss = criterion_transfer(log_ps, target)
valid_loss_accum += loss.item() * data.size(0)
# Show status; save parameters if validation loss has decreased.
#
train_loss = train_loss_accum / train_dataset_size
valid_loss = valid_loss_accum / valid_dataset_size
print(f"Epoch: {epoch} \tTraining Loss: {train_loss:.6f} \tValidation Loss: {valid_loss:.6f}")
if valid_loss < valid_loss_min:
print(f'\tValidation loss decreased: ({valid_loss_min:0.6f} -> {valid_loss:0.6f}); saving state.')
torch.save(model_transfer.state_dict(), 'model_transfer.pt')
valid_loss_min = valid_loss
#-#-# Do NOT modify the code below this line. #-#-#
# load the model that got the best validation accuracy
model_transfer.load_state_dict(torch.load('model_transfer.pt'))
Epoch: 1 Training Loss: 3.826577 Validation Loss: 3.605850 Validation loss decreased: (inf -> 3.605850); saving state. Epoch: 2 Training Loss: 3.548114 Validation Loss: 3.271772 Validation loss decreased: (3.605850 -> 3.271772); saving state. Epoch: 3 Training Loss: 3.288854 Validation Loss: 2.979481 Validation loss decreased: (3.271772 -> 2.979481); saving state. Epoch: 4 Training Loss: 3.053479 Validation Loss: 2.718567 Validation loss decreased: (2.979481 -> 2.718567); saving state. Epoch: 5 Training Loss: 2.833574 Validation Loss: 2.486762 Validation loss decreased: (2.718567 -> 2.486762); saving state. Epoch: 6 Training Loss: 2.646282 Validation Loss: 2.284452 Validation loss decreased: (2.486762 -> 2.284452); saving state. Epoch: 7 Training Loss: 2.514046 Validation Loss: 2.112932 Validation loss decreased: (2.284452 -> 2.112932); saving state. Epoch: 8 Training Loss: 2.353958 Validation Loss: 1.964515 Validation loss decreased: (2.112932 -> 1.964515); saving state. Epoch: 9 Training Loss: 2.240612 Validation Loss: 1.840721 Validation loss decreased: (1.964515 -> 1.840721); saving state. Epoch: 10 Training Loss: 2.132809 Validation Loss: 1.741006 Validation loss decreased: (1.840721 -> 1.741006); saving state. Epoch: 11 Training Loss: 2.031072 Validation Loss: 1.656227 Validation loss decreased: (1.741006 -> 1.656227); saving state. Epoch: 12 Training Loss: 1.963305 Validation Loss: 1.589143 Validation loss decreased: (1.656227 -> 1.589143); saving state. Epoch: 13 Training Loss: 1.908278 Validation Loss: 1.531943 Validation loss decreased: (1.589143 -> 1.531943); saving state. Epoch: 14 Training Loss: 1.845024 Validation Loss: 1.481128 Validation loss decreased: (1.531943 -> 1.481128); saving state. Epoch: 15 Training Loss: 1.823711 Validation Loss: 1.437371 Validation loss decreased: (1.481128 -> 1.437371); saving state. Epoch: 16 Training Loss: 1.775975 Validation Loss: 1.400905 Validation loss decreased: (1.437371 -> 1.400905); saving state. Epoch: 17 Training Loss: 1.713129 Validation Loss: 1.372481 Validation loss decreased: (1.400905 -> 1.372481); saving state. Epoch: 18 Training Loss: 1.693004 Validation Loss: 1.342107 Validation loss decreased: (1.372481 -> 1.342107); saving state. Epoch: 19 Training Loss: 1.644401 Validation Loss: 1.320925 Validation loss decreased: (1.342107 -> 1.320925); saving state. Epoch: 20 Training Loss: 1.622468 Validation Loss: 1.298481 Validation loss decreased: (1.320925 -> 1.298481); saving state. Epoch: 21 Training Loss: 1.588096 Validation Loss: 1.278297 Validation loss decreased: (1.298481 -> 1.278297); saving state. Epoch: 22 Training Loss: 1.565242 Validation Loss: 1.259957 Validation loss decreased: (1.278297 -> 1.259957); saving state. Epoch: 23 Training Loss: 1.540542 Validation Loss: 1.241704 Validation loss decreased: (1.259957 -> 1.241704); saving state. Epoch: 24 Training Loss: 1.512054 Validation Loss: 1.223456 Validation loss decreased: (1.241704 -> 1.223456); saving state. Epoch: 25 Training Loss: 1.504898 Validation Loss: 1.211027 Validation loss decreased: (1.223456 -> 1.211027); saving state. Epoch: 26 Training Loss: 1.494411 Validation Loss: 1.206707 Validation loss decreased: (1.211027 -> 1.206707); saving state. Epoch: 27 Training Loss: 1.490047 Validation Loss: 1.185925 Validation loss decreased: (1.206707 -> 1.185925); saving state. Epoch: 28 Training Loss: 1.461151 Validation Loss: 1.177694 Validation loss decreased: (1.185925 -> 1.177694); saving state. 
Epoch: 29 Training Loss: 1.457599 Validation Loss: 1.163164 Validation loss decreased: (1.177694 -> 1.163164); saving state.
Epoch: 30 Training Loss: 1.419118 Validation Loss: 1.160278 Validation loss decreased: (1.163164 -> 1.160278); saving state.
Epoch: 31 Training Loss: 1.429916 Validation Loss: 1.148867 Validation loss decreased: (1.160278 -> 1.148867); saving state.
Epoch: 32 Training Loss: 1.405443 Validation Loss: 1.138523 Validation loss decreased: (1.148867 -> 1.138523); saving state.
Epoch: 33 Training Loss: 1.397361 Validation Loss: 1.131755 Validation loss decreased: (1.138523 -> 1.131755); saving state.
Epoch: 34 Training Loss: 1.387947 Validation Loss: 1.120108 Validation loss decreased: (1.131755 -> 1.120108); saving state.
Epoch: 35 Training Loss: 1.369392 Validation Loss: 1.121956
Epoch: 36 Training Loss: 1.357100 Validation Loss: 1.111512 Validation loss decreased: (1.120108 -> 1.111512); saving state.
Epoch: 37 Training Loss: 1.372041 Validation Loss: 1.107094 Validation loss decreased: (1.111512 -> 1.107094); saving state.
Epoch: 38 Training Loss: 1.344225 Validation Loss: 1.093661 Validation loss decreased: (1.107094 -> 1.093661); saving state.
Epoch: 39 Training Loss: 1.344559 Validation Loss: 1.092576 Validation loss decreased: (1.093661 -> 1.092576); saving state.
Epoch: 40 Training Loss: 1.334872 Validation Loss: 1.090579 Validation loss decreased: (1.092576 -> 1.090579); saving state.
Epoch: 41 Training Loss: 1.331378 Validation Loss: 1.078005 Validation loss decreased: (1.090579 -> 1.078005); saving state.
Epoch: 42 Training Loss: 1.289692 Validation Loss: 1.077186 Validation loss decreased: (1.078005 -> 1.077186); saving state.
Epoch: 43 Training Loss: 1.339220 Validation Loss: 1.076965 Validation loss decreased: (1.077186 -> 1.076965); saving state.
Epoch: 44 Training Loss: 1.320309 Validation Loss: 1.063355 Validation loss decreased: (1.076965 -> 1.063355); saving state.
Epoch: 45 Training Loss: 1.287839 Validation Loss: 1.062626 Validation loss decreased: (1.063355 -> 1.062626); saving state.
Epoch: 46 Training Loss: 1.287450 Validation Loss: 1.053796 Validation loss decreased: (1.062626 -> 1.053796); saving state.
Epoch: 47 Training Loss: 1.281817 Validation Loss: 1.049081 Validation loss decreased: (1.053796 -> 1.049081); saving state.
Epoch: 48 Training Loss: 1.268819 Validation Loss: 1.053690
Epoch: 49 Training Loss: 1.265411 Validation Loss: 1.047209 Validation loss decreased: (1.049081 -> 1.047209); saving state.
Epoch: 50 Training Loss: 1.271733 Validation Loss: 1.042309 Validation loss decreased: (1.047209 -> 1.042309); saving state.
Epoch: 51 Training Loss: 1.242407 Validation Loss: 1.035257 Validation loss decreased: (1.042309 -> 1.035257); saving state.
Epoch: 52 Training Loss: 1.268359 Validation Loss: 1.033097 Validation loss decreased: (1.035257 -> 1.033097); saving state.
Epoch: 53 Training Loss: 1.257992 Validation Loss: 1.030645 Validation loss decreased: (1.033097 -> 1.030645); saving state.
Epoch: 54 Training Loss: 1.255630 Validation Loss: 1.029428 Validation loss decreased: (1.030645 -> 1.029428); saving state.
Epoch: 55 Training Loss: 1.224352 Validation Loss: 1.030273
Epoch: 56 Training Loss: 1.239602 Validation Loss: 1.025109 Validation loss decreased: (1.029428 -> 1.025109); saving state.
Epoch: 57 Training Loss: 1.193175 Validation Loss: 1.018310 Validation loss decreased: (1.025109 -> 1.018310); saving state.
Epoch: 58 Training Loss: 1.210981 Validation Loss: 1.020108
Epoch: 59 Training Loss: 1.197562 Validation Loss: 1.015377 Validation loss decreased: (1.018310 -> 1.015377); saving state.
Epoch: 60 Training Loss: 1.187691 Validation Loss: 1.010239 Validation loss decreased: (1.015377 -> 1.010239); saving state.
Epoch: 61 Training Loss: 1.209084 Validation Loss: 1.009348 Validation loss decreased: (1.010239 -> 1.009348); saving state.
Epoch: 62 Training Loss: 1.177366 Validation Loss: 1.002029 Validation loss decreased: (1.009348 -> 1.002029); saving state.
Epoch: 63 Training Loss: 1.172128 Validation Loss: 1.003834
Epoch: 64 Training Loss: 1.185489 Validation Loss: 1.000050 Validation loss decreased: (1.002029 -> 1.000050); saving state.
Epoch: 65 Training Loss: 1.169609 Validation Loss: 0.997659 Validation loss decreased: (1.000050 -> 0.997659); saving state.
Epoch: 66 Training Loss: 1.164588 Validation Loss: 0.989408 Validation loss decreased: (0.997659 -> 0.989408); saving state.
Epoch: 67 Training Loss: 1.155335 Validation Loss: 0.990323
Epoch: 68 Training Loss: 1.181692 Validation Loss: 0.984899 Validation loss decreased: (0.989408 -> 0.984899); saving state.
Epoch: 69 Training Loss: 1.169353 Validation Loss: 0.986261
Epoch: 70 Training Loss: 1.164554 Validation Loss: 0.984304 Validation loss decreased: (0.984899 -> 0.984304); saving state.
Epoch: 71 Training Loss: 1.139905 Validation Loss: 0.976084 Validation loss decreased: (0.984304 -> 0.976084); saving state.
Epoch: 72 Training Loss: 1.161235 Validation Loss: 0.979935
Epoch: 73 Training Loss: 1.158111 Validation Loss: 0.978781
Epoch: 74 Training Loss: 1.147483 Validation Loss: 0.975266 Validation loss decreased: (0.976084 -> 0.975266); saving state.
Epoch: 75 Training Loss: 1.132384 Validation Loss: 0.980433
Epoch: 76 Training Loss: 1.162953 Validation Loss: 0.977481
Epoch: 77 Training Loss: 1.138564 Validation Loss: 0.969854 Validation loss decreased: (0.975266 -> 0.969854); saving state.
Epoch: 78 Training Loss: 1.140162 Validation Loss: 0.968353 Validation loss decreased: (0.969854 -> 0.968353); saving state.
Epoch: 79 Training Loss: 1.107031 Validation Loss: 0.966810 Validation loss decreased: (0.968353 -> 0.966810); saving state.
Epoch: 80 Training Loss: 1.119824 Validation Loss: 0.967031
Epoch: 81 Training Loss: 1.104868 Validation Loss: 0.966301 Validation loss decreased: (0.966810 -> 0.966301); saving state.
Epoch: 82 Training Loss: 1.119196 Validation Loss: 0.964367 Validation loss decreased: (0.966301 -> 0.964367); saving state.
Epoch: 83 Training Loss: 1.098076 Validation Loss: 0.960549 Validation loss decreased: (0.964367 -> 0.960549); saving state.
Epoch: 84 Training Loss: 1.103673 Validation Loss: 0.957701 Validation loss decreased: (0.960549 -> 0.957701); saving state.
Epoch: 85 Training Loss: 1.078951 Validation Loss: 0.957289 Validation loss decreased: (0.957701 -> 0.957289); saving state.
Epoch: 86 Training Loss: 1.102185 Validation Loss: 0.955771 Validation loss decreased: (0.957289 -> 0.955771); saving state.
Epoch: 87 Training Loss: 1.111377 Validation Loss: 0.958069
Epoch: 88 Training Loss: 1.069585 Validation Loss: 0.950113 Validation loss decreased: (0.955771 -> 0.950113); saving state.
Epoch: 89 Training Loss: 1.061395 Validation Loss: 0.955700
Epoch: 90 Training Loss: 1.070746 Validation Loss: 0.958298
Epoch: 91 Training Loss: 1.083173 Validation Loss: 0.949416 Validation loss decreased: (0.950113 -> 0.949416); saving state.
Epoch: 92 Training Loss: 1.064583 Validation Loss: 0.949253 Validation loss decreased: (0.949416 -> 0.949253); saving state.
Epoch: 93 Training Loss: 1.066728 Validation Loss: 0.946852 Validation loss decreased: (0.949253 -> 0.946852); saving state.
Epoch: 94 Training Loss: 1.088399 Validation Loss: 0.944767 Validation loss decreased: (0.946852 -> 0.944767); saving state.
Epoch: 95 Training Loss: 1.080135 Validation Loss: 0.943579 Validation loss decreased: (0.944767 -> 0.943579); saving state.
Epoch: 96 Training Loss: 1.069651 Validation Loss: 0.943894
Epoch: 97 Training Loss: 1.068943 Validation Loss: 0.942383 Validation loss decreased: (0.943579 -> 0.942383); saving state.
Epoch: 98 Training Loss: 1.044672 Validation Loss: 0.939507 Validation loss decreased: (0.942383 -> 0.939507); saving state.
Epoch: 99 Training Loss: 1.037374 Validation Loss: 0.941248
Epoch: 100 Training Loss: 1.048047 Validation Loss: 0.937920 Validation loss decreased: (0.939507 -> 0.937920); saving state.
<All keys matched successfully>
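The `<All keys matched successfully>` line above is the value returned by `load_state_dict` when the best checkpoint is reloaded. A minimal sketch of that reload step, assuming the training loop saved its best weights to a file named `model_transfer.pt` (the filename is an assumption, not shown in this section):
# Reload the weights that achieved the lowest validation loss.
# 'model_transfer.pt' is an assumed checkpoint filename.
model_transfer.load_state_dict(torch.load('model_transfer.pt'))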
Try out your model on the test dataset of landmark images. Use the code cell below to calculate and print the test loss and accuracy. Ensure that your test accuracy is greater than 60%.
test(loaders_transfer, model_transfer, criterion_transfer, use_cuda)
Test Loss: 0.834709 Test Accuracy: 77% (974/1250)
Great job creating your CNN models! Now that you have put in all the hard work of creating accurate classifiers, let's define some functions to make it easy for others to use your classifiers.
Implement the function predict_landmarks, which accepts a file path to an image and an integer k, and then predicts the top k most likely landmarks. You are required to use your transfer learned CNN from Step 2 to predict the landmarks.
An example of the expected behavior of predict_landmarks:
>>> predicted_landmarks = predict_landmarks('example_image.jpg', 3)
>>> print(predicted_landmarks)
['Golden Gate Bridge', 'Brooklyn Bridge', 'Sydney Harbour Bridge']
import cv2
from PIL import Image
## the class names can be accessed at the `classes` attribute
## of your dataset object (e.g., `train_dataset.classes`)
def predict_landmarks(img_path, k):
    ## Done: return the names of the top k landmarks predicted by the transfer learned CNN
    with Image.open(img_path) as pil_image:
        # Crop to a square, and resize to target size + margin
        crop_size = min(pil_image.width, pil_image.height)
        scale_size = TRANSFORM_WIDTH_HEIGHT + DISCARD_BORDER * 2
        x_margin = (pil_image.width - crop_size) / 2
        y_margin = (pil_image.height - crop_size) / 2
        pil_image = pil_image.resize(
            size = (scale_size, scale_size),
            box = (
                x_margin,
                y_margin,
                x_margin + crop_size,
                y_margin + crop_size
            ),
            resample = Image.BICUBIC
        )
        # Crop to target size
        pil_image = pil_image.crop((
            DISCARD_BORDER,
            DISCARD_BORDER,
            DISCARD_BORDER + TRANSFORM_WIDTH_HEIGHT,
            DISCARD_BORDER + TRANSFORM_WIDTH_HEIGHT
        ))
        # Convert to NumPy array of floats in range [0.0, 1.0]
        np_image = np.array(pil_image) / 255.0
        # Normalize the colors
        np_image_n = (np_image - TRANSFORM_MEAN) / TRANSFORM_STD
        # Convert to tensor; rearrange the dimensions into (color, x, y), and add a batch dimension
        tensor = torch.Tensor(np_image_n.transpose((2, 0, 1))).unsqueeze(0)
        # Get class probabilities from the model
        if use_cuda:
            tensor = tensor.cuda()
        model_transfer.eval()
        with torch.no_grad():
            log_ps = model_transfer(tensor).cpu()
        # Sort the class probabilities by likelihood
        # (no need to convert the log-probabilities back; doesn't affect the ordering)
        predictions = reversed(sorted(zip(log_ps.view(-1).tolist(), train_dataset.classes)))
        _, ranked = zip(*predictions)
        # Return a list of formatted names; remove the class ID, and replace all "_"s with spaces
        return [
            name.split('.')[1].replace("_", " ")
            for name in ranked[:k]
        ]
# test on a sample image
predict_landmarks('images/test/09.Golden_Gate_Bridge/190f3bae17c32c37.jpg', 5)
['Golden Gate Bridge', 'Forth Bridge', 'Brooklyn Bridge', 'Sydney Harbour Bridge', 'Niagara Falls']
In the code cell below, implement the function suggest_locations, which accepts a file path to an image as input, and then displays the image and the top 3 most likely landmarks as predicted by predict_landmarks.
Some sample output for suggest_locations is provided below, but feel free to design your own user experience!

def suggest_locations(img_path):
    # get landmark predictions
    predicted_landmarks = predict_landmarks(img_path, 3)
    ## Done: display image and display landmark predictions
    # Show the image
    with Image.open(img_path) as pil_image:
        # Convert to NumPy array of floats in range [0.0, 1.0]
        np_image = np.array(pil_image) / 255.0
        figure, ax = plt.subplots(1, 1, figsize=(12, 8))
        ax.imshow(np_image)
        ax.set_title(os.path.split(img_path)[1])
        ax.set_xticks(())
        ax.set_yticks(())
        # Show the predicted locations
        figure.text(
            x = 0.5,
            y = 0,
            fontsize = 'xx-large',
            horizontalalignment = 'center',
            s = "Was this picture taken at the\n{}, {}, or {}?"
                .format(*predicted_landmarks)
        )
# test on a sample image
suggest_locations('images/test/09.Golden_Gate_Bridge/190f3bae17c32c37.jpg')
Test your algorithm by running the suggest_locations function on at least four images on your computer. Feel free to use any images you like.
Question 4: Is the output better than you expected :) ? Or worse :( ? Provide at least three possible points of improvement for your algorithm.
Answer:
Due to time constraints, I decided to pick one base model for transfer learning (VGG19). Other models, such as ResNet, may perform better.
A larger training dataset would also help. If this algorithm were deployed in a real photo sharing service, I would ask users for consent to use their uploaded photos (and any tags they assign) to train the model further, so that it improves over time.
Many pictures in the dataset don't show any distinctive feature of a landmark: people in front of a brick wall, common plants or animals that are not unique to the area, a nondescript stretch of road, and so on. One idea is to use the trained model to score each training image and filter out those with weak predictions, i.e. where every class probability is near chance level (1/50 = 0.02), then retrain on the remaining images. This may improve accuracy on images that do show a distinctive feature of a landmark, because of the reduced 'noise' in the training data. A minimal sketch of this filtering step follows this answer.
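The sketch below is my own hypothetical helper, not part of the project template. It assumes `train_dataset` is a torchvision `ImageFolder`-style dataset (so it exposes `samples`) and uses an arbitrary 0.05 threshold on the top softmax probability (chance level for 50 classes is 0.02).
import torch.nn.functional as F
from torch.utils.data import DataLoader

def find_weak_training_images(threshold=0.05, batch_size=32):
    """Return paths of training images whose best class probability is near chance."""
    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=False)
    weak_paths = []
    model_transfer.eval()
    with torch.no_grad():
        for batch_idx, (images, _) in enumerate(loader):
            if use_cuda:
                images = images.cuda()
            # Assumes the model outputs raw class scores; if it already ends in a
            # LogSoftmax, torch.exp(...) would be used instead of F.softmax(...).
            probs = F.softmax(model_transfer(images), dim=1)
            top_p, _ = probs.max(dim=1)
            for i, p in enumerate(top_p.cpu().tolist()):
                if p < threshold:
                    # ImageFolder keeps (path, class_index) pairs in `samples`
                    weak_paths.append(train_dataset.samples[batch_idx * batch_size + i][0])
    return weak_paths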
## Done: Execute the `suggest_locations` function on
## at least 4 images on your computer.
## Feel free to use as many code cells as needed.
# https://en.wikipedia.org/wiki/Delicate_Arch#/media/File:Cloudy_Sunset_at_Delicate_Arch_(8520706358).jpg
suggest_locations('images/Wikipedia/Cloudy_Sunset_at_Delicate_Arch_(8520706358).jpg')
# https://en.wikipedia.org/wiki/Wroc%C5%82aw%27s_dwarfs#/media/File:Ossolinek_(Ossolineo)_Wroclaw_dwarf_dressed_2016_P01.jpg
suggest_locations('images/Wikipedia/Ossolinek_(Ossolineo)_Wroclaw_dwarf_dressed_2016_P01.jpg')
# https://en.wikipedia.org/wiki/Hanging_Temple#/media/File:HangingMonasterySculptures.jpg
suggest_locations('images/Wikipedia/HangingMonasterySculptures.jpg')
# https://en.wikipedia.org/wiki/Haleakal%C4%81_National_Park#/media/File:Haleakala_Observatory_2017.jpg
suggest_locations('images/Wikipedia/1920px-Haleakala_Observatory_2017.jpg')
This is an implementation of the second optional goal in the project rubric. The node values of the penultimate layer in the model are used to compute a digest (digital fingerprint) of each image in the dataset. These digests can then be used to find similar images, as demonstrated in the last code cell.
The digests are compared by performing a dot product. More similar images result in a higher dot product.
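A raw dot product also rewards digests with a large overall magnitude, so a normalized (cosine) comparison is a common alternative. A minimal sketch of that variant (not used in the cells below):
import numpy as np

def cosine_similarity(digest_a, digest_b):
    # Scale the dot product by both vector norms so the result lies in [-1, 1]
    return digest_a.dot(digest_b) / (np.linalg.norm(digest_a) * np.linalg.norm(digest_b) + 1e-12)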
from pathlib import Path
# Run the model on an input image, and return the result from the last hidden layer
# for use as a content digest.
#
def feature_hash(img_path):
    with Image.open(img_path) as pil_image:
        # Crop to a square, and resize to target size + margin
        scale_size = TRANSFORM_WIDTH_HEIGHT + DISCARD_BORDER * 2
        crop_size = min(pil_image.width, pil_image.height)
        x_margin = (pil_image.width - crop_size) / 2
        y_margin = (pil_image.height - crop_size) / 2
        pil_image = pil_image.resize(
            size = (scale_size, scale_size),
            box = (
                x_margin,
                y_margin,
                x_margin + crop_size,
                y_margin + crop_size
            ),
            resample = Image.BICUBIC
        )
        # Crop to target size
        pil_image = pil_image.crop((
            DISCARD_BORDER,
            DISCARD_BORDER,
            DISCARD_BORDER + TRANSFORM_WIDTH_HEIGHT,
            DISCARD_BORDER + TRANSFORM_WIDTH_HEIGHT
        ))
        # Convert to NumPy array of floats in range [0.0, 1.0]
        np_image = np.array(pil_image) / 255.0
        # Normalize the colors
        np_image_n = (np_image - TRANSFORM_MEAN) / TRANSFORM_STD
        # Convert to tensor; rearrange the dimensions into (color, x, y), and add a batch dimension
        tensor = torch.Tensor(np_image_n.transpose((2, 0, 1))).unsqueeze(0)
        # Run the model without the final layer
        if use_cuda:
            tensor = tensor.cuda()
        model_transfer.eval()
        with torch.no_grad():
            x = model_transfer.features(tensor)
            x = model_transfer.avgpool(x)
            x = x.view(1, -1)
            for i in range(4):
                x = model_transfer.classifier[i](x)
        # Convert the result to a NumPy array, and return it.
        return x.cpu().view(-1).numpy()
# Return the relative path of each JPEG image in a folder (including sub-folders)
#
def iter_images(data_path):
    if data_path.is_dir():
        for item in data_path.iterdir():
            yield from iter_images(item)
    elif data_path.is_file() and data_path.suffix.lower() in ('.jpg', '.jpeg'):
        yield data_path
# Build a name-to-digest mapping of all images in the dataset
#
feature_digests = {
    img_path: feature_hash(str(img_path))
    for img_path in iter_images(Path(IMAGES_DIR))
}
#
# Find images similar to this one:
#
test_image_path = 'images/test/24.Soreq_Cave/18dbbad48a83a742.jpg'
# Function for plotting an image and caption onto a provided PyPlot Axes.
#
def plot_image(ax, img_path, caption):
    with Image.open(img_path) as pil_image:
        # Convert to NumPy array of floats in range [0.0, 1.0]
        np_image = np.array(pil_image) / 255.0
        ax.imshow(np_image)
        ax.set_title(caption)
        ax.set_xticks(())
        ax.set_yticks(())
# Show the input image
#
figure, ax = plt.subplots(1, 1, figsize=(16,8))
plot_image(ax, test_image_path, "Input image: " + os.path.split(test_image_path)[1])
# Compute the dot product between the digest of the target image and the digest of each dataset image.
# Sort the outcomes by value; the images with the highest values are the most similar to the input image.
#
target_digest = feature_hash(test_image_path)
match_levels = [
    (target_digest.dot(digest), img_path)
    for img_path, digest in feature_digests.items()
]
ranked_matches = reversed(sorted(match_levels))
# Show the five best matches.
#
print("Best matches:")
figure, axes = plt.subplots(1, 5, figsize=(16,8))
for i, (ax, match) in enumerate(zip(axes, ranked_matches), start=1):
    img_path = match[1]
    print(f"{i}: {img_path}")
    plot_image(ax, img_path, f"match {i}")
Best matches:
1: ../data/landmark_images/test/24.Soreq_Cave/18dbbad48a83a742.jpg
2: ../data/landmark_images/train/24.Soreq_Cave/00e1c553eec4fc70.jpg
3: ../data/landmark_images/train/24.Soreq_Cave/644d523bef46fee9.jpg
4: ../data/landmark_images/train/24.Soreq_Cave/3b1d976ecf196df1.jpg
5: ../data/landmark_images/train/24.Soreq_Cave/50f037baea03e9aa.jpg